Data Collection and Transliteration of Japanese Spontaneous Database in the Travel Arrangement Task Domain
Authors
Abstract
This paper describes the method used to construct and transcribe Japanese spontaneous speech data for VERBMOBIL, the German research project on speech translation. A spontaneous spoken dialogue database is the basis for developing speech and language processing for dialogue systems such as speech translation systems. Collection of extended human-to-human spoken dialogue data in the travel arrangement scenario has been initiated in German, English, and Japanese. The Romanized transcription is used to develop the acoustic model and the language model of the speech recognition system, as well as the natural language translation system. This paper describes issues in the transliteration method and several rules and conventions for transcribing Japanese spoken dialogue.
Similar resources
Language model selection based on the analysis of Japanese spontaneous speech on travel arrangement task
This paper deals with the issue of language model selection based on an analysis of data collected for Japanese spontaneous speech in the travel arrangement task, which contains five different subtasks. The procedure for transcribing and segmenting the Japanese spontaneous speech in Romanized transcription is described. The use of topic-dependent separated language models was evaluated...
An interlingua based on domain actions for machine translation of task-oriented dialogues
This paper describes an interlingua for spoken language translation that is based on domain actions in the travel planning domain. Domain actions are composed of speech acts (e.g., requestinformation), attributes (e.g., size, price), and objects (e.g., hotel, flight) and can take arguments. Development of the interlingua is guided by a database containing travel dialogues in English, Korean, Ja...
Identification of utterance intention in Japanese spontaneous spoken dialogue by use of prosody and keyword information
This paper describes a study on the identification of utterance intention in Japanese spontaneous dialogue. The procedure of tagging dialogue acts, which were labeled by hand, was evaluated through analysis of prosodic information and keyword recognition for dialogues in the scheduling and travel arrangement domains. It was shown that the integration of prosody and keywords relevant to illoc...
Recognition and Transliteration of Proper Nouns in Cross-Language Record Linkage by Constructing Transliterated Word Pairs
Proper nouns in metadata are representative features for linking the identical records across data sources in different languages. To improve the recognition of proper nouns in metadata and obtain their transliterations, we propose a method to construct bilingual transliteration word pairs, in which transliterated words in target language are back-transliterated to their original words in sourc...
Japanese spontaneous speech database with wide regional and age distribution
This paper introduces a Japanese spontaneous speech database of 3,771 speakers with wide regional and age distributions. This database is designed to capture Japanese spontaneous speech characteristics and is used to develop a speaker-independent (SI) speech recognition system. This paper describes the data collection and transcription. Moreover, we show preliminary analyses through SI speech r...
Journal title:
Volume, issue:
Pages: -
Publication date: 1999